Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PHP8.2 str_split function returns empty arrays for empty strings #3341

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

HuongNV13
Copy link

@HuongNV13 HuongNV13 commented Feb 3, 2023

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

In PHP 8.2, the str_split function will returns empty arrays for empty strings.
See: https://php.watch/versions/8.2/str_split-empty-string-empty-array

We can use mb_str_split() instead

Copy link
Collaborator

@oleibman oleibman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we are actually subject to any failures as a result of the Php change, but I think mb_substr_split is probably what should be using anyhow (in addition to other mb_ functions in lieu of the straight string equivalents). Can you please add some null string tests to tests\data\Calculation\Engineering\Hex2Dec ([0, '""'],), tests\data\Calculation\Engineering\Oct2Dec (['0', '""'],) and tests\data\Calculation\MathTrig\Arabic ([0, ''],), just to demonstrate that they work as in Excel.

src/PhpSpreadsheet/Reader/Csv/Delimiter.php Outdated Show resolved Hide resolved
@ConnectGrid
Copy link
Contributor

The Mbstring extension is not bundled with every distribution of PHP by default.

If the issue is splitting on an empty string, would it not be better to check for that case prior to calling str_split() (or mb_str_split()), or always ensure that an array is evaluated?

foreach (mb_str_split((array)$value) as $char) { ... }

// or

foreach (mb_str_split($value ?: []) as $char) { ... }

Would there ever be a false-y $value that would need to be acted upon by this function?

@MarkBaker
Copy link
Member

The Mbstring extension is not bundled with every distribution of PHP by default.

It's tagged as a required dependency in composer.json; but we can't control if devs do a composer install with --ignore-platform-reqs; or pulls a phar from one of those 3rd party sites that bundles dependencies, but doesn't do any PHP requirements checking

I'd also like to see some tests to identify any input $value where the change to str_split() could actually cause an issue in those methods

@oleibman
Copy link
Collaborator

oleibman commented Feb 4, 2023

I don't think it would be practical to use PhpSpreadsheet if mbstring is not enabled. Mbstring functions are directly called 45 times in 38 different modules. In addition, StringHelper methods, many of which use mbstring functions, are invoked in 186 different modules.

@MarkBaker
Copy link
Member

I don't think it would be practical to use PhpSpreadsheet if mbstring is not enabled. Mbstring functions are directly called 45 times in 38 different modules. In addition, StringHelper methods, many of which use mbstring functions, are invoked in 186 different modules.

It would be a very limited subset of functionality, but not impossible: for character conversion, we try to use iconv where possible, with mb_ functions as a fallback

We also have our own mbStrSplit function implemented in the StringHelper class

@MarkBaker
Copy link
Member

I see that you've added some unit tests; but I don't see any tests that demonstrate there is a bug unless this change to use mb_str_split instead of str_split is made. Perhaps there are cases where it might be a problem in identifying the csv separator; but I can't find any situation where using str_split is a problem in the Excel function implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants